Manuzio: An Object Language for Annotated Text Collections
Abstract
Traditionally, text collections are represented as text files with some form of markup that encodes extra-textual information such as metadata and annotations. We propose an approach that uses the natural structure of a literary text to build specialized object abstractions over text collections. These objects can be used to make multi-level annotations that need not nest hierarchically, to create complex metadata, and to perform complex queries and analyses on the collection. The language Manuzio is the result of this approach. In this paper we introduce its main features and sketch a system, based on the language, for managing persistent text collections and writing complex applications over them.
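The idea of annotating objects that do not nest hierarchically can be illustrated with a minimal sketch. It is written in Python, not in Manuzio, and does not reproduce the language's syntax or the authors' system: the class `TextualObject`, the helper `span_of`, the sample text, and the annotation keys are all hypothetical names chosen for illustration. It assumes that a textual object can be modeled as a typed character span over a shared source text.

```python
from dataclasses import dataclass, field


@dataclass
class TextualObject:
    """A typed span over a shared source text (hypothetical model)."""
    kind: str                 # e.g. "line" or "sentence"
    start: int                # offset of the first character
    end: int                  # offset one past the last character
    annotations: dict = field(default_factory=dict)

    def text(self, source: str) -> str:
        return source[self.start:self.end]

    def overlaps(self, other: "TextualObject") -> bool:
        return self.start < other.end and other.start < self.end

    def contains(self, other: "TextualObject") -> bool:
        return self.start <= other.start and other.end <= self.end


# Sample text: the second sentence runs across the line break.
SOURCE = "The sea is calm. The tide\nis full tonight."


def span_of(kind: str, fragment: str) -> TextualObject:
    """Build an object from the first occurrence of `fragment` in SOURCE."""
    start = SOURCE.index(fragment)
    return TextualObject(kind, start, start + len(fragment))


# Two independent structures over the same characters.
lines = [span_of("line", part) for part in SOURCE.split("\n")]
sentences = [
    span_of("sentence", "The sea is calm."),
    span_of("sentence", "The tide\nis full tonight."),
]

# Annotations attach to objects at any level, not to positions in a markup tree.
lines[0].annotations["speaker"] = "narrator"
sentences[1].annotations["note"] = "enjambment"

# Query: sentences that no single line contains, i.e. the non-hierarchical cases.
spanning = [s for s in sentences if not any(l.contains(s) for l in lines)]
```

Here the second sentence overlaps both lines but is contained in neither, which a single strictly nested markup tree cannot express directly. Keeping lines and sentences as separate object collections over the same text avoids forcing one structure inside the other.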
Similar resources
A Model and a Language for Representing and Manipulating Annotated Text Collections
Traditionally, collections of texts are digitally represented as a set of documents containing the text along with some kind of markup to define extra information, like metadata, annotations, etc. We propose a different approach that models the textual information in a dual way: as a formatted sequence of characters, as well as a composition of a particular kind of objects, called textual objec...
Deriving a Priori Co-occurrence Probability Estimates for Object Recognition from Social Networks and Text Processing
Certain components in images can be recognized with high accuracy, for example, backgrounds such as leaves, grass, snow, sky, water. These components provide the human eye with context for identifying items in the foreground. Likewise for the machine, the identification of background should help in the recognition of foreground objects. But, in this case, the computer needs explicit lists of ob...
Translating Images to Words for Recognizing Objects in Large Image and Video Collections
We present a new approach to the object recognition problem, motivated by the recent availability of large annotated image and video collections. This approach considers object recognition as the translation of visual elements to words, similar to the translation of text from one language to another. The visual elements represented in feature space are categorized into a finite set of blobs. Th...
Compilation of a Mexican Spanish text corpora
Collections of texts with syntactic annotation are nowadays useful resources. They are employed for diverse tasks in theoretical research and natural language applications. The most important collections are dedicated to English, but considerable effort has been made to develop corresponding collections for other languages. In this work we present the initial steps for the compilation of a Mexican Spani...
Object-based Annotations for Discovery and Collaboration
This paper discusses a design for object-based interaction and manipulation for annotating a text discovery application. Rather than attaching annotations to the interface or directly annotating the interface, objects from the interface can be directly annotated and copied into a collection to be viewed outside the context of the main interface. Objects are smaller chunks of the interface whic...